Abstract. In dynamic epistemic logic, actions are described using action
models. In this paper we introduce a framework for studying learnability
of action models from observations. We present first results concerning
propositional action models. First we check two basic learnability
criteria: finite identifiability (conclusively inferring the appropriate
action model in finite time) and identifiability in the limit (inconclusive
convergence to the right action model). We show that deterministic
actions are finitely identifiable, while non-deterministic actions require
more learning power: they are identifiable in the limit. We then move
on to a particular learning method, which proceeds via restriction of
a space of events within a learning-specific action model. This way of
learning closely resembles the well-known update method from dynamic
epistemic logic. We introduce several different learning methods suited
for finite identifiability of particular types of deterministic actions.


Dynamic epistemic logic (DEL) allows analyzing knowledge change in a systematic
way. The static component of a situation is represented by an epistemic
model, while the structure of the dynamic component is encoded in an action
model. An action model can be applied to the epistemic model via the so-called
product update operation, resulting in a new up-to-date epistemic model of the
situation after the action has been executed. A language, interpreted on epistemic
models, allows expressing conditions under which an action takes effect
(so-called preconditions), and the effects of such actions (so-called postconditions).
This setting is particularly useful for modeling the process of epistemic
planning (see [?]): one can ask which sequence of actions should be executed
in order for a given epistemic formula to hold in the epistemic model after the
actions are executed.

The purpose of this paper is to investigate possible learning mechanisms
involved in discovering the 'internal structure' of actions on the basis of their
executions. In other words, we are concerned with qualitative learning of action
models on the basis of observations of pairs of the form (initial state, resulting
state). We analyze learnability of action models in the context of two learning
conditions: finite identifiability (conclusively inferring the appropriate action
model in finite time) and identifiability in the limit (inconclusive convergence to
the right action model). The paper draws on the results from formal learning
theory applied to DEL (see [11,13,12]).

Learning of action models is highly relevant in the context of epistemic planning.
A planning agent might not initially know the effects of her actions, so she
will initially not be able to plan to achieve any goals. However, if she can learn
the relevant action models through observing the effect of the actions (either
by executing the actions herself, or by observing other agents), she will eventually
learn how to plan. Our ultimate goal is to integrate learning of actions into
(epistemic) planning agents. In this paper, we seek to lay the foundations for
this goal by studying learnability of action models from streams of observations.

The structure of the paper is as follows. In Section 1 we recall the basic
concepts and notation concerning action models and action types in DEL. In
Section 2 we specify our learning framework and provide general learnability
results. In Section 3 we study particular learning functions, which proceed via
updating action models with new information. Finally, in Section 4 we indicate
how to lift our results from the level of individual action learning to that of
action library learning. In the end we briefly discuss related and further work.


1 Languages and action types 

Let us first present the basic notions required for the rest of the article (see [6,8]
for more details). Following the conventions of automated planning, we take the
set of atomic propositions and the set of actions to be finite. Given a finite set
$P$ of atomic propositions, we define the (single-agent) epistemic language over
$P$, $\mathcal{L}_{epis}(P)$, by the following BNF: $\varphi ::= p \mid \neg\varphi \mid \varphi \wedge \varphi \mid K\varphi$, where $p \in P$.
The language $\mathcal{L}_{prop}(P)$ is the propositional sublanguage without the $K\varphi$ clause.
When $P$ is clear from the context, we write $\mathcal{L}_{epis}$ and $\mathcal{L}_{prop}$ instead of $\mathcal{L}_{epis}(P)$
and $\mathcal{L}_{prop}(P)$, respectively. By means of the standard abbreviations we introduce
the additional symbols $\to$, $\vee$, $\leftrightarrow$, $\bot$, and $\top$.

Definition 1 (Epistemic models and states). An epistemic model over a
set of atomic propositions $P$ is $\mathcal{M} = (W, R, V)$, where $W$ is a finite set of worlds,
$R \subseteq W \times W$ is an equivalence relation, called the indistinguishability relation,
and $V : P \to \mathcal{P}(W)$ is a valuation function. An epistemic state is a pointed
epistemic model $(\mathcal{M}, w)$ consisting of an epistemic model $\mathcal{M} = (W, R, V)$ and a
distinguished world $w \in W$ called the actual world.

A propositional state (or simply state) over $P$ is a subset of $P$ (or, equivalently,
a propositional valuation $v : P \to \{0,1\}$). We identify propositional states and
singleton epistemic models via the following canonical isomorphism. A propositional
state $s \subseteq P$ is isomorphic to the epistemic model $\mathcal{M} = (\{w\}, \{(w,w)\}, V)$
where $V(p) = \{w\}$ if $p \in s$ and $V(p) = \emptyset$ otherwise. Truth in epistemic states
$(\mathcal{M}, w)$ with $\mathcal{M} = (W, R, V)$ (and hence propositional states) is defined as usual
and hence omitted.

Dynamic epistemic logic (DEL) introduces the concept of an action model
for modelling the changes to states brought about by the execution of actions
[?]. We here use a variant that includes postconditions [?], which means that
actions can have both epistemic effects (changing the beliefs of agents) and ontic
effects (changing the factual states of affairs).

Definition 2 (Action models). An action model over a set of atomic propositions
$P$ is $\mathcal{A} = (E, Q, pre, post)$, where $E$ is a finite set of events; $Q \subseteq E \times E$
is an equivalence relation called the indistinguishability relation; $pre : E \to
\mathcal{L}_{epis}(P)$ assigns to each event a precondition; $post : E \to \mathcal{L}_{prop}(P)$ assigns to
each event a postcondition. Postconditions are conjunctions of literals (atomic
propositions and their negations) or $\top$.¹ $dom(\mathcal{A}) = E$ denotes the domain of $\mathcal{A}$.
The set of all action models over $P$ is denoted $Actions(P)$.

Intuitively, events correspond to the ways in which an action changes the epistemic
state, and the indistinguishability relation codes (an agent's) ability to
recognize the difference between those different ways. In an event $e$, $pre(e)$ specifies
what conditions have to be satisfied for it to take effect, and $post(e)$ specifies
its outcome.

Example 1. Consider the action of tossing a coin. It can be represented by the
following action model ($h$ means that the coin is facing heads up):

$\mathcal{A} = \quad e_1 : (\top, h) \qquad e_2 : (\top, \neg h)$

We label each event by a pair whose first argument is the event's precondition
while the second is its postcondition. Hence, formally we have $\mathcal{A} =
(E, Q, pre, post)$ with $E = \{e_1, e_2\}$, $Q$ the identity on $E$, $pre(e_1) = pre(e_2) =
\top$, $post(e_1) = h$ and $post(e_2) = \neg h$. The action model encodes that tossing the
coin will either make $h$ true ($e_1$) or $h$ false ($e_2$).

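Definition 3 (Product update). Let $\mathcal{M} = (W, R, V)$ and $\mathcal{A} =
(E, Q, pre, post)$ be an epistemic model and action model (over a set of atomic
propositions $P$), respectively. The product update of $\mathcal{M}$ with $\mathcal{A}$ is the epistemic
model $\mathcal{M} \otimes \mathcal{A} = (W', R', V')$, where $W' = \{(w,e) \in W \times E \mid (\mathcal{M}, w) \models
pre(e)\}$; $R' = \{((w,e),(v,f)) \in W' \times W' \mid wRv \text{ and } eQf\}$; $V'(p) = \{(w,e) \in
W' \mid post(e) \models p \text{ or } ((\mathcal{M}, w) \models p \text{ and } post(e) \not\models \neg p)\}$. For $e \in dom(\mathcal{A})$, we
define $\mathcal{M} \otimes e = \mathcal{M} \otimes (\mathcal{A} \upharpoonright \{e\})$.

The product update $\mathcal{M} \otimes \mathcal{A}$ represents the result of executing the action $\mathcal{A}$ in
the state(s) represented by $\mathcal{M}$.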

Example 2. Continuing Example 1, consider a situation of an agent seeing a
coin lying heads-up, i.e., the singleton epistemic state $\mathcal{M} = (\{w\}, \{(w,w)\}, V)$
with $V(h) = \{w\}$. Let us now calculate the result of executing the coin toss in
this model.

$\mathcal{M} \otimes \mathcal{A} = \quad (w, e_1) : h \qquad (w, e_2) :$

Here each world is labelled by the propositions being true at the world.

We say that two action models $\mathcal{A}_1$ and $\mathcal{A}_2$ are equivalent, written $\mathcal{A}_1 \equiv \mathcal{A}_2$,
if for any epistemic model $\mathcal{M}$, $\mathcal{M} \otimes \mathcal{A}_1 \;\underline{\leftrightarrow}\; \mathcal{M} \otimes \mathcal{A}_2$, where $\underline{\leftrightarrow}$ denotes standard
bisimulation on epistemic models [?].

1 We are here using the postcondition conventions from [7], which are slightly non-standard.
Any action model with standard postconditions can be turned into one of
our type, but it might become exponentially larger in the process [?].
1.1 Action types

We can identify a number of different action types. 

Definition 4 (Action types). An action model $\mathcal{A} = (E, Q, pre, post)$ is:

- atomic if $|E| = 1$.

- deterministic if all preconditions are mutually inconsistent, that is, $\models
pre(e) \wedge pre(f) \to \bot$ for all distinct $e, f \in E$.

- fully observable if $Q$ is the identity relation on $E$. Otherwise it is partially
observable.

- precondition-free if $pre(e) = \top$ for all $e \in E$.

- propositional if $pre(e) \in \mathcal{L}_{prop}$ for all $e \in E$.

- universally applicable if $\models \bigvee_{e \in E} pre(e)$.

- normal if for all propositional literals $l$ and all $e \in E$, $pre(e) \models l$ implies
$post(e) \not\models l$.

- with basic preconditions if all $pre(e)$ are conjunctions of literals (propositional
atoms and their negations).

- with maximal preconditions if all $pre(e)$ are maximally consistent conjunctions
of literals (i.e., preconditions are conjunctions of literals in which each
atomic proposition $p$ occurs exactly once, either as $p$ or as $\neg p$).

Some of the notions defined above are known from existing literature [?].
The newly introduced notions are precondition-free, universally applicable, and
normal actions, as well as actions with basic preconditions. Note that action
types interact with each other: atomic actions are automatically both deterministic
and fully observable, and precondition-free actions can only be deterministic
if atomic.²

In the remainder of this section we set a uniform representation of action 
models that we will later on use in learning methods. We also specify and justify 
the restrictions we impose on action models. 

Propositionality In this paper we are concerned with product updates of
propositional states with propositional action models. Let $s$ denote a propositional
state over $P$, and let $\mathcal{A} = (E, Q, pre, post)$ be any propositional action
model. Using the definition above and the canonical isomorphism between
propositional states and singleton epistemic states, we get that $s \otimes \mathcal{A}$ is isomorphic
to the epistemic model $(W', R', V')$, where $W' = \{e \in E \mid s \models pre(e)\}$,
$R' = \{(e,f) \in W' \times W' \mid eQf\}$, $V'(p) = \{e \in W' \mid post(e) \models p \text{ or } (s \models
p \text{ and } post(e) \not\models \neg p)\}$. If $\mathcal{A}$ is fully observable, then the indistinguishability relation of
$s \otimes \mathcal{A}$ is the identity relation. This means that we can think of $s \otimes \mathcal{A}$ as a set of
propositional states (via the canonical isomorphism between singleton epistemic
models and propositional states). In this case we write $s' \in s \otimes \mathcal{A}$ to mean that $s'$
is one of the propositional states in $s \otimes \mathcal{A}$. When $\mathcal{A}$ is atomic we have $s \otimes \mathcal{A} = s'$
for some propositional state $s'$ (using again the canonical isomorphism).
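The propositional product update just described can be sketched in a few lines of Python. This is only an illustration under an assumed encoding that is not part of the paper: a state is a frozenset of true atoms, a conjunction of literals is a dict from atoms to truth values, and an event is a pair (pre, post). The function names `holds`, `apply_event` and `product_update` are hypothetical.

```python
# Hypothetical encoding (not from the paper): a state is a frozenset of the
# atoms that are true; a conjunction of literals is a dict {atom: bool};
# an event is a pair (pre, post) of such conjunctions.

def holds(state, conj):
    """s |= conj: every literal of the conjunction agrees with the state."""
    return all((atom in state) == value for atom, value in conj.items())

def apply_event(state, post):
    """State after an event: post-literals win, other atoms keep their value."""
    made_true = {a for a, v in post.items() if v}
    made_false = {a for a, v in post.items() if not v}
    return frozenset((state | made_true) - made_false)

def product_update(state, events):
    """The update of a state with a fully observable propositional action,
    returned as the set of resulting propositional states."""
    return {apply_event(state, post) for pre, post in events if holds(state, pre)}

# Coin toss of Example 1: e1 = (T, h), e2 = (T, not h).
coin = [({}, {"h": True}), ({}, {"h": False})]
outcomes = product_update(frozenset({"h"}), coin)
```

Here `outcomes` contains both the state where h is true and the empty state, matching the two worlds obtained in Example 2.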

2 The actions considered in propositional STRIPS planning (called set-theoretic planning
in [?]) correspond to epistemic actions that are atomic and have basic postconditions.
Example 3. Consider the action model $\mathcal{A}$ of Example 1 (the coin toss). It is a
precondition-free, fully observable, non-deterministic action. Consider an initial
propositional state $s = \{h\}$. Then $s \otimes \mathcal{A}$ is the epistemic model of Example 2. It
has two worlds, one in which $h$ is true, and another in which $h$ is false. So we have
$\emptyset, \{h\} \in s \otimes \mathcal{A}$, i.e., the outcome of tossing the coin is either the propositional
state where $h$ is false ($\emptyset$) or the one where $h$ is true ($\{h\}$).

Basic preconditions and normality When preconditions are basic, pre- and
postconditions are of the same simple normal form: they are conjunctions of
literals. Below we show that any propositional action model can be turned into
an action model having this normal form. We also show that we can ensure all
action models to be normal.

Proposition 1. Any propositional action model is equivalent to a normal action 
model with basic preconditions. 

Proof sketch. Take a propositional action model $(E, Q, pre, post)$. We first make
the preconditions basic in the following way. Take any event $e \in E$ with precondition
$\varphi$. Turn $\varphi$ into disjunctive normal form $\bigvee_{i \in I} \bigwedge_{j \in J} p_{ij}$. Then replace $e$ by
a set of events $e_i$, $i \in I$, where $post(e_i) = post(e)$ and $pre(e_i) = \bigwedge_{j \in J} p_{ij}$. Each
$e_i$ is connected by a $Q$-edge to every event $e$ was originally connected to. This
is done for each event $e \in E$. It is easy to see that the resulting action model
$(E', Q', pre', post')$ has basic preconditions and is equivalent to the original one.

We now "normalise" $post'$ into a new mapping $post''$ in order to obtain an
equivalent normal action model $(E', Q', pre', post'')$. Note that since the action
model $(E', Q', pre', post')$ has basic preconditions, the normality condition can
be expressed in a particularly simple way: for all literals $l$ and all $e \in E$, if $l$ is a
conjunct of $pre(e)$ then it is not a conjunct of $post(e)$. For each event $e \in E'$, we
now define $post''(e)$ from $post'(e)$ by deleting each conjunct of $post'(e)$ which is
also a conjunct in $pre'(e)$. It is easy to see that this gives an equivalent action
model: consider an event $e$ and a literal $l$ which is both a conjunct of $pre'(e)$ and
$post'(e)$. Since $l$ is a conjunct of $pre'(e)$, $l$ has to be true for $e$ to occur. Since $l$ is
a conjunct of $post'(e)$, $l$ will also be true after the event $e$ has occurred. Hence,
the event $e$ does not affect the truth value of $l$, and we get an equivalent event
by removing $l$ from the postcondition.
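The normalisation step of the proof sketch can be illustrated as follows. This is a minimal sketch that assumes preconditions are already basic and uses a hypothetical dict-based encoding of conjunctions of literals (atom to truth value); `normalise`, `holds`, `apply_event` and `all_states` are illustrative names, not notation from the paper. The final check compares the behaviour of the original and the normalised event on every state over the atoms used.

```python
from itertools import combinations

def holds(state, conj):
    """s |= conj for a conjunction of literals given as {atom: bool}."""
    return all((atom in state) == value for atom, value in conj.items())

def apply_event(state, post):
    """Post-literals override the state, all other atoms are unchanged."""
    made_true = {a for a, v in post.items() if v}
    made_false = {a for a, v in post.items() if not v}
    return frozenset((state | made_true) - made_false)

def all_states(atoms):
    """All subsets of the given atoms, as frozensets."""
    atoms = sorted(atoms)
    return [frozenset(c) for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)]

def normalise(events):
    """Delete from each postcondition every literal that is already a
    conjunct of the corresponding (basic) precondition."""
    return [(pre, {a: v for a, v in post.items() if pre.get(a) != v})
            for pre, post in events]

# Hypothetical event: precondition (p and not q), postcondition (p and q).
events = [({"p": True, "q": False}, {"p": True, "q": True})]
normal = normalise(events)

# The original and the normalised model yield the same outcomes everywhere.
same_behaviour = all(
    [apply_event(s, post) for pre, post in events if holds(s, pre)]
    == [apply_event(s, post) for pre, post in normal if holds(s, pre)]
    for s in all_states({"p", "q"})
)
```

In this example only the literal p is removed from the postcondition, since q occurs negated in the precondition and is therefore genuinely changed by the event.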

In this paper we are only going to be concerned with propositional actions, and
so, due to Proposition 1, we can restrict attention to normal actions having basic
preconditions.

Universal applicability The condition for being universally applicable intuitively
means that the action specifies an outcome no matter what state it is
applied to. In this paper we will only be concerned with universally applicable
action models. To understand the reason for this restriction consider the example
of an action open-door with singleton action model $(\neg open \wedge \neg locked, open)$, i.e.,
if the door is currently closed and unlocked, performing open-door will open it.
This action model does not specify what happens if an agent attempts open-door
when the door is either already open or is locked. We can easily fix this by adding
another event to the action model, $(open \vee locked, \top)$, expressing that if one tries
to open it when locked or already open, nothing happens. More generally, any
action model $\mathcal{A} = (E, Q, pre, post)$ which is not universally applicable can be
turned into a universally applicable action model by adding the following event:
$(\neg \bigvee_{e \in E} pre(e), \top)$. If an agent is learning results of an action, she should in
any possible state be able to attempt executing the action, and hence the action
model should specify an outcome of this attempt. For this reason, we require
universal applicability.
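The repair described above can be sketched for the open-door example. The paper adds a single event with precondition $\neg \bigvee_{e \in E} pre(e)$; the sketch below instead adds one "nothing happens" event per uncovered state, each with a maximal precondition, which is an assumption made here to stay within basic preconditions. All function names and the dict encoding of literals are hypothetical.

```python
from itertools import combinations

def holds(state, conj):
    """s |= conj for a conjunction of literals given as {atom: bool}."""
    return all((atom in state) == value for atom, value in conj.items())

def all_states(atoms):
    """All subsets of the given atoms, as frozensets."""
    atoms = sorted(atoms)
    return [frozenset(c) for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)]

def uncovered_states(atoms, events):
    """States in which no event of the action model is applicable."""
    return [s for s in all_states(atoms)
            if not any(holds(s, pre) for pre, _ in events)]

def make_universally_applicable(atoms, events):
    """Add, for every uncovered state, an event with a maximal precondition
    describing that state and the trivial postcondition (empty dict = T)."""
    extra = [({a: (a in s) for a in sorted(atoms)}, {})
             for s in uncovered_states(atoms, events)]
    return events + extra

atoms = {"open", "locked"}
# open-door: if the door is closed and unlocked, it becomes open.
open_door = [({"open": False, "locked": False}, {"open": True})]

missing = uncovered_states(atoms, open_door)
repaired = make_universally_applicable(atoms, open_door)
```

With two atoms there are four states; only the closed-and-unlocked state is covered by the original event, so three fallback events are added and no state remains uncovered afterwards.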

2 Learning action models 

First we will focus on learning an individual action, i.e., inferring semantics of a 
single action name. The semantics of an action name is an action model. In the 
following we will use the expressions action and action model interchangeably. 
Below we will first present general results on learnability of various types of
action models, and then, in Section 3, we study particular learning methods and
exemplify them.

We are concerned with learning fully observable actions (action models). 
Partially observable actions are generally not learnable in the strict sense to 
be defined below. Consider for instance an agent trying to learn an action that 
controls the truth value of a proposition p, but where the agent cannot observe 
p (events making p true and events making p false are indistinguishable). Then 
clearly there is no way for that agent to learn exactly how the action works. 
The case of fully observable actions is much simpler. If initially the agent has 
no uncertainty, her “belief state” can be represented by a propositional state. 
Executing any sequence of fully observable actions will then again lead to a 
propositional state. So in the case of fully observable actions, we can assume 
actions to make transitions between propositional states. 

For the rest of this section, except in examples, we fix a set P of atomic 
propositions. 

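Definition 5. A stream $\varepsilon$ is an infinite sequence of pairs $(s, s')$ of propositional
states over $P$, i.e., $\varepsilon \in (\mathcal{P}(P) \times \mathcal{P}(P))^{\omega}$. The elements $(s, s')$ of $\varepsilon$ are called
observations. Let $\mathbb{N} := \mathbb{N}^{+} \cup \{0\}$, let $\varepsilon$ be a stream over $P$, and let $s, t \in \mathcal{P}(P)$.
$\varepsilon_n$ stands for the $n$-th observation in $\varepsilon$. $\varepsilon[n]$ stands for the initial segment
of $\varepsilon$ of length $n$, i.e., $\varepsilon_0, \ldots, \varepsilon_{n-1}$. $set(\varepsilon) := \{(x,y) \mid (x,y) \text{ is an element of } \varepsilon\}$
stands for the set of all observations in $\varepsilon$; we similarly define $set(\varepsilon[n])$ for initial
segments of streams.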

Definition 6. Let $\varepsilon$ be a stream over $P$ and $\mathcal{A}$ a fully observable action model
over $P$. The stream $\varepsilon$ is sound with respect to $\mathcal{A}$ if for all $(s,s') \in set(\varepsilon)$,
$s' \in s \otimes \mathcal{A}$. The stream $\varepsilon$ is complete with respect to $\mathcal{A}$ if for all $s \subseteq P$ and
all $s' \in s \otimes \mathcal{A}$, $(s,s') \in set(\varepsilon)$. In this paper we always assume the streams to
be sound and complete. For brevity, if $\varepsilon$ is sound and complete wrt $\mathcal{A}$, we will
write: '$\varepsilon$ is for $\mathcal{A}$'.
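Soundness and completeness are properties of infinite streams, but both only depend on the set of observations, so they can be checked on a finite set of pairs. The following sketch does this for the coin toss; the encoding (frozensets for states, dicts for conjunctions of literals) and the function names are assumptions made for illustration only.

```python
from itertools import combinations

def holds(state, conj):
    return all((atom in state) == value for atom, value in conj.items())

def apply_event(state, post):
    made_true = {a for a, v in post.items() if v}
    made_false = {a for a, v in post.items() if not v}
    return frozenset((state | made_true) - made_false)

def product_update(state, events):
    return {apply_event(state, post) for pre, post in events if holds(state, pre)}

def all_states(atoms):
    atoms = sorted(atoms)
    return [frozenset(c) for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)]

def is_sound(observations, events):
    """Every observed pair (s, s') is a possible outcome of the action."""
    return all(s2 in product_update(s, events) for s, s2 in observations)

def is_complete(observations, atoms, events):
    """Every possible outcome of the action occurs among the observations."""
    seen = set(observations)
    return all((s, s2) in seen
               for s in all_states(atoms)
               for s2 in product_update(s, events))

coin = [({}, {"h": True}), ({}, {"h": False})]
heads, tails = frozenset({"h"}), frozenset()
only_tails = [(heads, tails), (tails, tails)]
everything = [(heads, heads), (heads, tails), (tails, heads), (tails, tails)]
```

The observation set `only_tails` is sound but not complete for the fair coin, whereas `everything` is both.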


Learning Actions Models: Qualitative Approach 




Definition 7 (Learning function). A learning function is a computable $L :
(\mathcal{P}(P) \times \mathcal{P}(P))^{*} \to Actions(P) \cup \{\uparrow\}$.

In other words, a learning function takes a finite sequence of observations (pairs 
of propositional states) and outputs an action model or a symbol corresponding 
to ‘undecided’. 

We will study two types of learning: finite identifiability and identifiability in 
the limit. First let us focus on finite identifiability. Intuitively, finite identifiability 
corresponds to conclusive learning: upon observing some finite amount of action 
executions the learning function outputs, with certainty, a correct model for the 
action in question (up to equivalence). This certainty can be expressed in terms
of the function being once-defined: it is allowed to output an action model only
once, and there is no chance of correction later on. Formally, we say that a learning
function $L$ is (at most) once-defined if for any stream $\varepsilon$ for an action over $P$ and
$n, k \in \mathbb{N}$ such that $n \neq k$, we have that $L(\varepsilon[n]) = \uparrow$ or $L(\varepsilon[k]) = \uparrow$.

Definition 8. Let $X$ be a class of action models and $\mathcal{A} \in X$, $L$ be a learning
function, and $\varepsilon$ be a stream. We say that:

1. $L$ finitely identifies $\mathcal{A}$ on $\varepsilon$ if $L$ is once-defined and there is an $n \in \mathbb{N}$ s.t.
$L(\varepsilon[n]) \equiv \mathcal{A}$.

2. $L$ finitely identifies $\mathcal{A}$ if $L$ finitely identifies $\mathcal{A}$ on every stream for $\mathcal{A}$.

3. $L$ finitely identifies $X$ if $L$ finitely identifies every $\mathcal{A} \in X$.

4. $X$ is finitely identifiable if there is a function $L$ which finitely identifies $X$.

The following definition and theorem are adapted from [15,14,13].

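Definition 9. Let $X \subseteq Actions(P)$. A set $D_{\mathcal{A}} \subseteq \mathcal{P}(P) \times \mathcal{P}(P)$ is a definite finite
tell-tale set (DFTT) for $\mathcal{A}$ in $X$ if

1. $D_{\mathcal{A}}$ is sound for $\mathcal{A}$ (i.e., for all $(s,s') \in D_{\mathcal{A}}$, $s' \in s \otimes \mathcal{A}$),

2. $D_{\mathcal{A}}$ is finite, and

3. for any $\mathcal{A}' \in X$, if $D_{\mathcal{A}}$ is sound for $\mathcal{A}'$, then $\mathcal{A} \equiv \mathcal{A}'$.

Lemma 1. $X$ is finitely identifiable iff there is an effective procedure $D : X \to
\mathcal{P}(\mathcal{P}(P) \times \mathcal{P}(P))$, given by $\mathcal{A} \mapsto D_{\mathcal{A}}$, that on input $\mathcal{A}$ produces a definite finite
tell-tale of $\mathcal{A}$.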

Proof. [$\Rightarrow$] Assume that $X$ is finitely identifiable. Then there is a computable
function $L$ that finitely identifies $X$. We use that function to define $D$. Once the
learning function $L$ identifies an action $\mathcal{A}$ it has to give it as a definite output,
and this will happen for some $\varepsilon[n]$. We then set $D(\mathcal{A}) = set(\varepsilon[n])$. It is easy
to check that such $D(\mathcal{A})$ is a definite tell-tale set. [$\Leftarrow$] Assume that there is an
effective procedure $D : X \to \mathcal{P}(\mathcal{P}(P) \times \mathcal{P}(P))$, that on input $\mathcal{A}$ produces a
definite finite tell-tale of $\mathcal{A}$. Take an enumeration of $X$ and take any $\mathcal{A} \in X$ and
any $\varepsilon$ for $\mathcal{A}$. We use $D$ to define the learning function. At each step $n \in \mathbb{N}$, $L$
compares $\varepsilon[n]$ with $D(\mathcal{A}_1), \ldots, D(\mathcal{A}_n)$. Once, at some step $\ell \in \mathbb{N}$, it finds $\mathcal{A}_k$
such that $D(\mathcal{A}_k) \subseteq set(\varepsilon[\ell])$, it outputs $\mathcal{A}_k$. It is easy to verify that then $\mathcal{A}_k \equiv \mathcal{A}$.

In other words, the finite set of observations $D_{\mathcal{A}}$ is consistent with only one
action $\mathcal{A}$ in the class (up to equivalence of actions). $D$ is a computable function
that gives a $D_{\mathcal{A}}$ for any action $\mathcal{A}$.
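The right-to-left direction of Lemma 1 can be illustrated by a small tell-tale based learner. The sketch below is an assumption-laden illustration, not the paper's construction verbatim: the class is a finite list of deterministic, universally applicable candidates, the tell-tale of a candidate is taken to be its full transition table, and `None` plays the role of the 'undecided' output. All names are hypothetical.

```python
from itertools import combinations

def holds(state, conj):
    return all((atom in state) == value for atom, value in conj.items())

def apply_event(state, post):
    made_true = {a for a, v in post.items() if v}
    made_false = {a for a, v in post.items() if not v}
    return frozenset((state | made_true) - made_false)

def all_states(atoms):
    atoms = sorted(atoms)
    return [frozenset(c) for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)]

def run(state, events):
    """Unique outcome of a deterministic, universally applicable action."""
    (result,) = {apply_event(state, post)
                 for pre, post in events if holds(state, pre)}
    return result

def tell_tale(atoms, events):
    """Assumed tell-tale: the full transition table of the action."""
    return {(s, run(s, events)) for s in all_states(atoms)}

def first_match(atoms, candidates, observations):
    """First candidate whose tell-tale is contained in the observations."""
    seen = set(observations)
    for events in candidates:
        if tell_tale(atoms, events) <= seen:
            return events
    return None

def finite_learner(atoms, candidates, prefix):
    """Once-defined learner: answers only at the first prefix with a match."""
    if any(first_match(atoms, candidates, prefix[:k]) is not None
           for k in range(len(prefix))):
        return None  # an answer was already given on a shorter prefix
    return first_match(atoms, candidates, prefix)

atoms = {"h"}
make_h = [({}, {"h": True})]
make_not_h = [({}, {"h": False})]
do_nothing = [({}, {})]
candidates = [make_h, make_not_h, do_nothing]
heads, tails = frozenset({"h"}), frozenset()
stream_prefix = [(tails, heads), (heads, heads), (tails, heads)]
```

On this prefix the learner stays undecided after one observation, outputs `make_h` after two, and returns to 'undecided' afterwards, so it is defined at most once.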

Theorem 1. For any finite set of propositions P the set of (fully observable) 
deterministic propositional actions over P is finitely identifiable. 

Proof. We use Lemma 1 and hence define $D$: $D(\mathcal{A}) = \{(s,s') \mid s \otimes \mathcal{A} =
s', \text{ where } s, s' \in \mathcal{P}(P)\}$. Let us check that indeed $D(\mathcal{A})$ is a DFTT for $\mathcal{A}$. We
need to show conditions 1, 2 and 3 of Definition 9 for $D(\mathcal{A})$. 1: $D(\mathcal{A})$ is sound for
$\mathcal{A}$, trivially. 2: $D(\mathcal{A})$ is finite, because $P$ is finite. 3: Let us take any propositional
action $\mathcal{A}'$ such that $D(\mathcal{A})$ is sound for $\mathcal{A}'$. This means, by the definition of $D$
above and the fact that $\mathcal{A}$ and $\mathcal{A}'$ are deterministic, that for all propositional
states $s, s'$ over $P$, if $s \otimes \mathcal{A} = s'$ then $s \otimes \mathcal{A}' = s'$. It follows that $s \otimes \mathcal{A} = s \otimes \mathcal{A}'$ for
all propositional states $s$, and hence $\mathcal{A} \equiv \mathcal{A}'$ (since $\mathcal{A}$ and $\mathcal{A}'$ are propositional).
Finally, $D$ is computable because $P$ is finite.

Example 4. Theorem 1 shows that deterministic actions are finitely identifiable.
We will now show that this does not carry over to non-deterministic actions, that
is, non-deterministic actions are in general not finitely identifiable. Consider the
action of tossing a coin, given by the action model $\mathcal{A}$ in Example 1. If in fact the
coin is fake and it will always land tails (so it only consists of the event $e_2$), in
no finite amount of tosses the agent can exclude that the coin is fair, and that
heads will start appearing in the long run (that $e_1$ will eventually occur). So
the agent will never be able to say "stop" and declare the action model to only
consist of $e_2$. This argument can be generalised, leading to the theorem below.

Theorem 2. For any finite set of propositions P the set of arbitrary (including 
non-deterministic) fully observable propositional actions over P is not finitely 
identifiable. 

Proof. Assume that the set of arbitrary propositional actions over $P$ is finitely
identifiable. Then there is a learning function $L$ that finitely identifies it. Among
such actions we will have two inequivalent ones, $\mathcal{A}$ and $\mathcal{A}'$, such that $\mathcal{A}' = \mathcal{A} \upharpoonright dom(\mathcal{A}')$³
(as with the fake coin and the fair coin in the example above). Let us now
construct a stream $\varepsilon$ on which $L$ fails to finitely identify one of them. Let
$\varepsilon$ start with enumerating all pairs of propositional states that are sound for the
smaller action, $\mathcal{A}'$, and keep repeating this pattern. Since this is indeed a stream for
$\mathcal{A}'$, the learning function has to at some point output an equivalent of $\mathcal{A}'$
(otherwise it fails to finitely identify $\mathcal{A}'$, which leads to contradiction). Assume
that this happens at some stage $n \in \mathbb{N}$. Now, observe that $\varepsilon[n]$ is sound with
respect to $\mathcal{A}$ too, so starting at stage $n+1$ let us make $\varepsilon$ enumerate the
remaining pairs of propositional states consistent with $\mathcal{A}$. That means that
there is a stream $\varepsilon$ for $\mathcal{A}$ on which $L$ does not finitely identify $\mathcal{A}$. Contradiction.

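3 For any action model $\mathcal{A} = (E, Q, pre, post)$ and any subset $E' \subseteq E$ we define $\mathcal{A} \upharpoonright E'$
as the restriction of $\mathcal{A}$ to the domain $E'$, that is, $\mathcal{A} \upharpoonright E' = (E', Q', pre', post')$ where
$Q' = Q \cap (E')^2$, $pre' = pre \upharpoonright E'$ and $post' = post \upharpoonright E'$.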
A weaker condition of learnability, identifiability in the limit, allows widening
the scope of learnable actions, to cover also the case of arbitrary actions. Identifiability
in the limit requires that the learning function after observing some finite
amount of action executions outputs a correct model (up to equivalence) for the
action in question and then forever keeps to this answer (up to equivalence) in all
the outputs to follow. This type of learning can be called 'inconclusive', because
certainty cannot be achieved in finite time.

Definition 10. Let $X$ be a class of action models and $\mathcal{A} \in X$, $L$ be a learning
function, and $\varepsilon$ be a stream. We say that:

1. $L$ identifies $\mathcal{A}$ on $\varepsilon$ in the limit if there is $k \in \mathbb{N}$ such that for all $n > k$,
$L(\varepsilon[n]) \equiv \mathcal{A}$.

2. $L$ identifies $\mathcal{A}$ in the limit if $L$ identifies $\mathcal{A}$ in the limit on every $\varepsilon$ for $\mathcal{A}$.

3. $L$ identifies $X$ in the limit if $L$ identifies in the limit every $\mathcal{A} \in X$.

4. $X$ is identifiable in the limit if there is an $L$ which identifies $X$ in the limit.

The following theorem is adapted from [2].

Theorem 3. For any finite set of propositions P the set of (fully observable) 
propositional actions over P is identifiable in the limit. 

Proof. The argument is similar to the proof of Theorem 1. Analogously to the
concept of definite finite tell-tale set, we define a weaker notion of finite tell-tale
set (FTT). Let $P$ be a set of propositions and let $X \subseteq Actions(P)$. A set
$D_{\mathcal{A}} \subseteq \mathcal{P}(P) \times \mathcal{P}(P)$ is a finite tell-tale set (FTT) for $\mathcal{A}$ in $X$ if: 1. $D_{\mathcal{A}}$ is sound
for $\mathcal{A}$ (i.e., for all $(s,s') \in D_{\mathcal{A}}$, $s' \in s \otimes \mathcal{A}$); 2. $D_{\mathcal{A}}$ is finite, and 3. for any $\mathcal{A}' \in X$,
if $D_{\mathcal{A}}$ is sound for $\mathcal{A}'$, then $\mathcal{A} \equiv \mathcal{A}' \upharpoonright E'$ for some $E' \subseteq dom(\mathcal{A}')$.

Similarly to the argument for Lemma 1 one can show that $X$ is identifiable
in the limit iff there is an effective procedure $D : X \to \mathcal{P}(\mathcal{P}(P) \times \mathcal{P}(P))$, given
by $\mathcal{A} \mapsto D_{\mathcal{A}}$, that on input $\mathcal{A}$ enumerates a finite tell-tale of $\mathcal{A}$. We will omit
the proof for the sake of brevity.

Now it is enough to show that indeed such a function $D$ can be given for the
set of arbitrary (fully observable, propositional) actions over $P$. Define $D(\mathcal{A}) =
\{(s,s') \mid s' \in s \otimes \mathcal{A}, \text{ where } s, s' \in \mathcal{P}(P)\}$. Let us check that indeed $D(\mathcal{A})$ is a
FTT for $\mathcal{A}$. 1: $D(\mathcal{A})$ is sound for $\mathcal{A}$, trivially. 2: $D(\mathcal{A})$ is finite, because $P$ is finite.
3: Let us take any propositional action $\mathcal{A}'$ such that $D(\mathcal{A})$ is sound for $\mathcal{A}'$. This
means, by the definition of $D$ above, that for all propositional states $s, s'$ over $P$,
if $s' \in s \otimes \mathcal{A}$ then $s' \in s \otimes \mathcal{A}'$. This implies that $s \otimes \mathcal{A}$ is a submodel of $s \otimes \mathcal{A}'$
for all propositional states $s$, and hence that $\mathcal{A}$ is equivalent to a submodel of
$\mathcal{A}'$ (since actions are propositional).

Finally, again $D$ is computable because $P$ is finite.
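To make the idea of convergence in the limit concrete, here is a sketch of one possible limit learner. It is not the construction used in the proof: as an assumption, it conjectures one event per distinct observation, with a maximal precondition describing the source state and a postcondition listing the atoms that changed. On a sound and complete stream the conjecture stops changing once every possible transition has been seen. The encoding and all names are hypothetical.

```python
def holds(state, conj):
    return all((atom in state) == value for atom, value in conj.items())

def apply_event(state, post):
    made_true = {a for a, v in post.items() if v}
    made_false = {a for a, v in post.items() if not v}
    return frozenset((state | made_true) - made_false)

def product_update(state, events):
    return {apply_event(state, post) for pre, post in events if holds(state, pre)}

def conjecture(atoms, prefix):
    """One event per distinct observed transition (s, s')."""
    distinct = sorted(set(prefix), key=lambda o: (sorted(o[0]), sorted(o[1])))
    events = []
    for s, s2 in distinct:
        pre = {a: (a in s) for a in sorted(atoms)}           # describes s exactly
        post = {a: (a in s2) for a in sorted(atoms)
                if (a in s) != (a in s2)}                     # only changed atoms
        events.append((pre, post))
    return events

atoms = {"h"}
coin = [({}, {"h": True}), ({}, {"h": False})]
heads, tails = frozenset({"h"}), frozenset()
# A prefix of a stream for the fair coin that repeats all four transitions.
prefix = [(tails, tails), (tails, heads), (heads, tails), (heads, heads)] * 2

early = conjecture(atoms, prefix[:1])
stable = conjecture(atoms, prefix[:4])
later = conjecture(atoms, prefix[:8])
```

The early conjecture is revised as new transitions arrive, which is exactly the kind of mind change that finite identification forbids; after the fourth observation the conjecture agrees with the coin toss on every state and no longer changes.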

Having established the general facts about finite identifiability and identifiability
in the limit of propositional fully-observable actions, we will now turn to
studying particular learning methods suited for such learning conditions.
3 Learning actions via update

Standard DEL, and in particular public announcement logic, deals with learning
within epistemic models. If an agent is in a state described by an epistemic model
$\mathcal{M}$ and learns from a reliable source that $\varphi$ is true, her state will be updated
by eliminating all the worlds where $\varphi$ is false. That is, the model $\mathcal{M}$ will be
restricted to the worlds where $\varphi$ is true. This can also be expressed in terms of
action models, where the learning of $\varphi$ corresponds to taking the product update
of $\mathcal{M}$ with the event model $(\varphi, \top)$ (public announcement of $\varphi$).

Now we turn to learning actions rather than learning facts. Actions are
represented by action models, so to learn an action means to infer the action
model that describes it. Consider again the action model $\mathcal{A}$ of Example 1. The
coin toss is non-deterministic and fully observable: either $h$ or $\neg h$ will
non-deterministically be made true and the agent is able to distinguish these two
outcomes (no edge between $e_1$ and $e_2$). However, we can also think of $\mathcal{A}$ as the
hypothesis space of a deterministic action, that is, the action $\mathcal{A}$ is in fact
deterministically making $h$ true or false, but the agent is currently uncertain about
which one it is. Given the prior knowledge that the action in question must be
deterministic, learning the action could proceed in a way analogous to that of
update in the usual DEL setting.

It could for instance be that the agent knows that the coin is fake and always
lands on the same side, but the agent initially does not know which. After
the agent has executed the action once, she will know. She will observe either
$h$ becoming false or $h$ becoming true, and can hence discard either $e_1$ or $e_2$
from her hypothesis space. She has now learned the correct deterministic action
model for tossing the fake coin. Note the nice symmetry to learning of facts:
learning of facts means eliminating worlds in epistemic models, learning of
actions means eliminating events in action models.

In the rest of this section, all action models are silently assumed to be: fully
observable, propositional, and universally applicable. Furthermore, we can assume
them to be normal and have basic preconditions, due to Proposition 1.

3.1 Learning precondition-free atomic actions 

We will first propose and study an update learning method especially geared 
towards learning the simplest possible type of ontic actions: precondition-free 
atomic actions. 

Definition 11. For any deterministic action model $\mathcal{A}$ and any pair of
propositional states $(s,s')$, the update of $\mathcal{A}$ with $(s,s')$ is defined by
$\mathcal{A} \mid (s,s') := \mathcal{A} \upharpoonright \{e \in E \mid \text{if } s \models pre(e) \text{ then } s \otimes e = s'\}$. For a set
$S$ of pairs of propositional states, we define: $\mathcal{A} \mid S := \mathcal{A} \upharpoonright \{e \in
E \mid \text{for all } (s,s') \in S, \text{ if } s \models pre(e) \text{ then } s \otimes e = s'\}$.

The update $\mathcal{A} \mid (s,s')$ restricts the action model $\mathcal{A}$ to the events that are consistent
with observing $s'$ as the result of executing the action in question in the
state $s$.
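The update operation can be sketched directly as a filter on events. The encoding below (frozensets for states, dicts for conjunctions of literals, an action model as a list of (pre, post) pairs) is an assumption for illustration, and `update` and `update_all` are hypothetical names for the single-observation and set-of-observations versions of the definition.

```python
def holds(state, conj):
    return all((atom in state) == value for atom, value in conj.items())

def apply_event(state, post):
    made_true = {a for a, v in post.items() if v}
    made_false = {a for a, v in post.items() if not v}
    return frozenset((state | made_true) - made_false)

def update(events, observation):
    """Keep the events that are either inapplicable in s or map s to s'."""
    s, s2 = observation
    return [(pre, post) for pre, post in events
            if not holds(s, pre) or apply_event(s, post) == s2]

def update_all(events, observations):
    """Update with a whole collection of observations, one at a time."""
    for observation in observations:
        events = update(events, observation)
    return events

# Hypothesis space for the fake coin: it always lands heads or always tails.
hypotheses = [({}, {"h": True}), ({}, {"h": False})]
heads, tails = frozenset({"h"}), frozenset()
after_one_toss = update(hypotheses, (heads, tails))
```

After observing a single toss that turns heads into tails, only the event making h false survives, which mirrors the fake-coin discussion earlier in this section.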
Definition 12. The update learning function for precondition-free
atomic actions over $P$ is the learning function $L_1$ defined by
$L_1(\varepsilon[n]) = \mathcal{A}^1_{init} \mid set(\varepsilon[n])$ where $\mathcal{A}^1_{init} = (E, Q, pre, post)$ with
$E = \{\psi \mid \psi \text{ is a consistent conjunction of literals over } P\}$; $Q$ is the identity
relation on $E$; $pre(e) = \top$ for all $e \in E$; $post(\psi) = \psi$.

In Figure 1 we show a generic example of such update learning for $P = \{p, q\}$.



Fig. 1. On the left hand side $\mathcal{A}^1_{init}$ for $P = \{p, q\}$, together with sets corresponding
to possible observations. We have labelled each event $e$ by $post(e)$. On the right hand
side the state of learning after observing $\varepsilon_0 = (\{q\}, \{p, q\})$.


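Theorem 4. The class of precondition-free atomic actions is finitely identifiable
by the update learning function $L_1^{update}$ defined in the following way:

$$L_1^{update}(\varepsilon[n]) = \begin{cases} L_1(\varepsilon[n]) & \text{if } card(dom(L_1(\varepsilon[n]))) = 1 \text{ and for all } k < n,\ L_1^{update}(\varepsilon[k]) = \uparrow; \\ \uparrow & \text{otherwise.} \end{cases}$$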
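The learner of Theorem 4 can be sketched as follows, assuming a dict encoding of postconditions and using `None` for the 'undecided' output; the function names are hypothetical. Since all events are precondition-free, each event is represented by its postcondition alone, and the initial model contains one event per consistent conjunction of literals (each atom is made true, made false, or left untouched).

```python
from itertools import product

def apply_event(state, post):
    made_true = {a for a, v in post.items() if v}
    made_false = {a for a, v in post.items() if not v}
    return frozenset((state | made_true) - made_false)

def initial_events(atoms):
    """All consistent conjunctions of literals over the atoms (3^|P| many)."""
    atoms = sorted(atoms)
    return [{a: v for a, v in zip(atoms, choice) if v is not None}
            for choice in product([True, False, None], repeat=len(atoms))]

def L1(atoms, prefix):
    """Events of the initial model consistent with every observation so far."""
    return [post for post in initial_events(atoms)
            if all(apply_event(s, post) == s2 for s, s2 in prefix)]

def L1_update(atoms, prefix):
    """Answer exactly once: at the first prefix leaving a single event."""
    def decided(observations):
        return len(L1(atoms, observations)) == 1
    if decided(prefix) and not any(decided(prefix[:k]) for k in range(len(prefix))):
        return L1(atoms, prefix)[0]
    return None

atoms = {"p", "q"}
# Prefix of a stream for the hypothetical target action "make p true".
prefix = [(frozenset({"q"}), frozenset({"p", "q"})),
          (frozenset(), frozenset({"p"})),
          (frozenset({"p"}), frozenset({"p"}))]
```

After the first observation two candidate postconditions remain (p, and p together with q); the second observation rules out the latter, so the learner answers at that point and stays silent afterwards.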

Proof. Let $\mathcal{A}$ denote a precondition-free atomic action over $P$, and let $\varepsilon$ be a
stream for $\mathcal{A}$. We show that $L_1^{update}$ finitely identifies $\mathcal{A}$ on $\varepsilon$. First note that
$L_1^{update}$ is obviously (at most) once-defined. Further, we need to show that for
some $n \in \mathbb{N}$, $L_1^{update}(\varepsilon[n]) = \mathcal{A}$. By definition of $L_1^{update}$, this is the case only if
$card(dom(\mathcal{A}^1_{init} \mid set(\varepsilon[n]))) = 1$ and $(\mathcal{A}^1_{init} \mid set(\varepsilon[n])) = \mathcal{A}$.

Since $\mathcal{A}$ is atomic and precondition-free, it must consist of a single event of
the form $(\top, \psi)$. By definition of $\mathcal{A}^1_{init}$ (Definition 12), this implies that there is
an event $e \in dom(\mathcal{A}^1_{init})$ such that $\mathcal{A} = \mathcal{A}^1_{init} \upharpoonright \{e\}$. Since $\varepsilon$ is a stream for $\mathcal{A}$, $e$
is in $\mathcal{A}^1_{init} \mid set(\varepsilon[n])$.
By Theorem 1 we know that $\mathcal{A}$ is finitely identifiable, so, by Lemma 1, there
is a DFTT $D_{\mathcal{A}}$ for $\mathcal{A}$. Since $\varepsilon$ is for $\mathcal{A}$ and $D_{\mathcal{A}}$ is by definition finite and sound
for $\mathcal{A}$, there is $n \in \mathbb{N}$ such that $D_{\mathcal{A}} \subseteq set(\varepsilon[n])$. By definition of DFTT and
of $\mathcal{A}^1_{init}$ we get that for all $e' \in \mathcal{A}^1_{init}$ such that $e' \neq e$ there is $(s,s')$ in $\varepsilon[n]$
such that $s \otimes e' \neq s'$, and hence $e'$ is not in $\mathcal{A}^1_{init} \mid set(\varepsilon[n])$. To see why that
is, assume the contrary, i.e., that there is an $e' \in \mathcal{A}^1_{init}$, $e' \neq e$, such that for all
$(s,s') \in \varepsilon[n]$ we have $s \otimes e' = s'$. Then $D_{\mathcal{A}}$ is also sound for the singleton action
model containing only $e'$. But this contradicts that $D_{\mathcal{A}}$ is a DFTT for $\mathcal{A}$ (since
all pairs of distinct events in $\mathcal{A}^1_{init}$ are inequivalent).

Combining the above we get that $\mathcal{A}^1_{init} \mid set(\varepsilon[n])$ contains exactly one event,
$e$, and hence $\mathcal{A}^1_{init} \mid set(\varepsilon[n]) = \mathcal{A}^1_{init} \upharpoonright \{e\} = \mathcal{A}$, showing the required.

3.2 Learning deterministic actions with preconditions 

We now turn to learning of action models with preconditions. First we only treat 
the case of maximal preconditions, then afterwards we generalise to arbitrary 
(not necessarily maximal) preconditions. 

Definition 13. The update learning function for deterministic action models
with maximal preconditions over $P$ is the learning function $L_2$ defined by
$L_2(\mathcal{E}[n]) = A^2_{init} \mid set(\mathcal{E}[n])$ where $A^2_{init} = (E, Q, pre, post)$ with
$E = \{(\varphi, \psi) \mid \varphi$ is a maximally consistent conjunction of literals over $P$ and
$\psi$ is a conjunction of literals over $P$ not containing any of the conjuncts of $\varphi\}$;
$Q$ is the identity on $E \times E$; $pre((\varphi, \psi)) = \varphi$; $post((\varphi, \psi)) = \psi$.

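Theorem 5. The class of deterministic action models with maximal preconditions
is finitely identifiable by the following update learning function $L_2^{update}$.

$$L_2^{update}(\mathcal{E}[n]) = \begin{cases} L_2(\mathcal{E}[n]) & \text{if for all } e, e' \in dom(L_2(\mathcal{E}[n])),\ e \neq e' \text{ implies } pre(e) \neq pre(e'), \\ & \text{and for all } k < n,\ L_2^{update}(\mathcal{E}[k]) = \uparrow; \\ \uparrow & \text{otherwise.} \end{cases}$$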

Proof. Consider any event $e = (\varphi, \psi)$ in $A$. Its precondition $\varphi$ is a maximally
consistent conjunction of literals over $P$. Due to normality, its postcondition $\psi$
cannot contain any of the conjuncts of $\varphi$. Hence $e$ must be identical to one of the
events of $A^2_{init}$. In other words, $A$ must be isomorphic to a restriction/submodel
of $A^2_{init}$.

Let $\mathcal{E}$ be a stream for $A$. We show that $L_2^{update}$ finitely identifies $A$ on $\mathcal{E}$.
$L_2^{update}$ is obviously (at most) once-defined. Further, we need that for some
$n \in \mathbb{N}$, $L_2^{update}(\mathcal{E}[n]) = A$. By Theorem Q] we know that $A$ is finitely identifiable,
so, by Lemma 1, there is a DFTT $D_A$ for $A$, and hence there is $n \in \mathbb{N}$ such that
$D_A \subseteq set(\mathcal{E}[n])$. Firstly, note that all distinct $e, e' \in dom(L_2(\mathcal{E}[n]))$ have distinct
preconditions, $pre(e) \neq pre(e')$. Here a similar argument applies as in the proof of
Theorem 4. This gives us that $L_2^{update}$ is guaranteed to give an answer. Secondly,
we have to show that for any $k < n$, if for all distinct $e, e' \in dom(L_2(\mathcal{E}[k]))$ it
is the case that $pre(e) \neq pre(e')$, then $L_2(\mathcal{E}[k]) = A$. Take any $k < n$; there are
two cases. If $\mathcal{E}[k]$ does not include a DFTT for $A$, then there are two distinct
$e, e' \in L_2(\mathcal{E}[k])$ such that $pre(e) = pre(e')$. If $\mathcal{E}[k]$ includes a DFTT for $A$,
then actions $A' \neq A$ have been eliminated from $A^2_{init}$ by step $k$, and hence
$L_2(\mathcal{E}[k]) = A$.

Example 5. Consider a simple scenario with a pushbutton and a light bulb. Assume
there is only one proposition $p$: 'the light is on', and only one action:
pushing the button. We assume an agent wants to learn the functioning of the
pushbutton. There are 4 distinct possibilities: 1) the button does not affect the
light (i.e., the truth value of $p$); 2) it is an on button: it turns on the light
unconditionally (makes $p$ true); 3) it is an off button: it turns off the light
unconditionally (makes $p$ false); 4) it is an on/off button (flips the truth value of
$p$). If the agent is learning by update, it starts with the action model $A^2_{init}$
containing the following events: $(p, \top)$, $(\neg p, \top)$, $(p, \neg p)$, and $(\neg p, p)$. Note that by
definition $A^2_{init}$ does not contain the events $(p, p)$ and $(\neg p, \neg p)$, since they both
have a postcondition conjunct which is also a precondition conjunct. Assume
the first two observations the learner receives (the first elements of a stream $\mathcal{E}$)
are $(\emptyset, \{p\})$ and $(\{p\}, \emptyset)$. Since the agent uses learning by update, she revises her
model as follows (cf. Definition 11):


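[Figure: the action models $A^2_{init} \mid \mathcal{E}_0$ (after observation $\mathcal{E}_0$) and $A^2_{init} \mid \mathcal{E}_0 \mid \mathcal{E}_1$ (after observation $\mathcal{E}_1$).]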


Now the agent has reached a deterministic action model $A^2_{init} \mid set(\mathcal{E}[2])$, and
can report this to be the correct model of the action, cf. Theorem 5. Note that
the two observations correspond to first pushing the button when the light is
off ($\mathcal{E}_0$), and afterwards pushing the button again after the light has come on
($\mathcal{E}_1$). These two observations are sufficient to learn that the pushbutton is of
the on/off type (it has one event that makes $p$ true if $p$ is currently false, and
another event that makes $p$ false if $p$ is currently true).

Consider now another stream $\mathcal{E}'$ where the first two elements are $(\emptyset, \{p\})$ and
$(\{p\}, \{p\})$. Update learning will now work as follows:


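[Figure: the action models $A^2_{init} \mid \mathcal{E}'_0$ (after observation $\mathcal{E}'_0$) and $A^2_{init} \mid \mathcal{E}'_0 \mid \mathcal{E}'_1$ (after observation $\mathcal{E}'_1$).]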


This time the learner identifies the button to be an on button, again after only
two observations. It is not hard to show that in a setting with only one
propositional symbol $p$, any deterministic action will be identified after having received
the first two distinct observations.
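
The pushbutton scenario of Example 5 can be replayed with a small Python sketch of update learning with maximal preconditions. It assumes that states are frozensets of true atoms, that events are pairs `(pre, post)` of literal sets, and that the product update sets the postcondition literals and leaves the rest of the state unchanged. All function names are illustrative and not taken from the paper.

```python
from itertools import product

def holds(state, conj):
    """True iff every literal (atom, value) of the conjunction holds in `state`."""
    return all((a in state) == v for a, v in conj)

def apply_post(state, post):
    """Assumed product update: make the postcondition literals true."""
    true_atoms = {a for a, v in post if v}
    false_atoms = {a for a, v in post if not v}
    return frozenset((state - false_atoms) | true_atoms)

def init_max_pre(atoms):
    """Events (pre, post) with maximal pre and post sharing no conjunct with pre."""
    atoms = sorted(atoms)
    events = set()
    for pre_vals in product((True, False), repeat=len(atoms)):
        pre = frozenset(zip(atoms, pre_vals))
        # Each atom is either untouched (None) or set to the opposite of its
        # value in the precondition.
        options = [(None, not v) for v in pre_vals]
        for post_vals in product(*options):
            post = frozenset((a, v) for a, v in zip(atoms, post_vals) if v is not None)
            events.add((pre, post))
    return events

def update(events, observations):
    """Drop every event that is applicable in some s but does not yield s'."""
    return {(pre, post) for pre, post in events
            if all(not holds(s, pre) or apply_post(s, post) == s2
                   for s, s2 in observations)}

def learn(atoms, stream):
    """Answer (n, model) at the first n with pairwise distinct preconditions."""
    init = init_max_pre(atoms)
    for n in range(len(stream) + 1):
        model = update(init, stream[:n])
        pres = [pre for pre, _ in model]
        if len(pres) == len(set(pres)):
            return n, model
    return None  # no conclusive answer on this finite prefix

off, on = frozenset(), frozenset({"p"})
toggle = learn({"p"}, [(off, on), (on, off)])
on_button = learn({"p"}, [(off, on), (on, on)])
```

Returning at the first qualifying `n` is one way to realise the once-defined requirement: the learner is silent on all shorter prefixes and answers exactly once.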

Example 6. Consider learning the functioning of an $n$-bit binary counter, where
the action to be learned is the increment operation. For $i = 1, \ldots, n$, we use the
proposition $c_i$ to denote that the $i$th least significant bit is 1. Consider first the
case $n = 2$. A possible stream for the increment operation is the following:


$(\emptyset, \{c_1\}), (\{c_1\}, \{c_2\}), (\{c_2\}, \{c_2, c_1\}), (\{c_2, c_1\}, \emptyset), \ldots$


[Figure: the observations of the stream drawn as pairs of 2-bit counter states, with the bits labelled $c_2$ and $c_1$.]


Using the update learning method on this stream, it is easy to show that the
learner will after the first 4 observations be able to report the correct action
model containing the following events: $(\neg c_2 \wedge \neg c_1, c_1)$, $(\neg c_2 \wedge c_1, c_2 \wedge \neg c_1)$,
$(c_2 \wedge \neg c_1, c_1)$, $(c_2 \wedge c_1, \neg c_2 \wedge \neg c_1)$. Note that since $A^2_{init}$ has maximal preconditions,
the action model learned for an $n$-bit counter will necessarily contain $2^n$ events:
one for each possible configuration of the $n$ bits. If we did not insist on maximal
preconditions, we would only need $n + 1$ events to describe the $n$-bit counter:
$(\neg c_i \wedge c_{i-1} \wedge c_{i-2} \wedge \cdots \wedge c_1, c_i \wedge \neg c_{i-1} \wedge \neg c_{i-2} \wedge \cdots \wedge \neg c_1)$ for all $i = 2, \ldots, n$,
$(\neg c_1, c_1)$ and $(c_n \wedge \cdots \wedge c_1, \neg c_n \wedge \cdots \wedge \neg c_1)$. This means that there is room for
improvement in our learning method.
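
The claim that $n + 1$ events suffice can be checked mechanically. The following Python sketch builds the $n + 1$ events listed above and verifies that, in every counter state, exactly one of them is applicable and that it yields the incremented value. It assumes the same product update as before (postcondition literals are made true, everything else is kept); atom `i` stands for $c_i$, and all names are illustrative.

```python
def holds(state, conj):
    """True iff every literal (atom, value) of the conjunction holds in `state`."""
    return all((a in state) == v for a, v in conj)

def apply_post(state, post):
    """Assumed product update: make the postcondition literals true."""
    true_atoms = {a for a, v in post if v}
    false_atoms = {a for a, v in post if not v}
    return frozenset((state - false_atoms) | true_atoms)

def counter_events(n):
    """The n + 1 events (pre, post) describing the n-bit increment."""
    # (not c_1, c_1): the lowest bit is 0, so just set it.
    events = [(frozenset({(1, False)}), frozenset({(1, True)}))]
    # Bit i is 0 and all lower bits are 1: set bit i, clear the lower bits.
    for i in range(2, n + 1):
        pre = frozenset({(i, False)} | {(j, True) for j in range(1, i)})
        post = frozenset({(i, True)} | {(j, False) for j in range(1, i)})
        events.append((pre, post))
    # All bits are 1: overflow back to zero.
    all_set = frozenset((j, True) for j in range(1, n + 1))
    all_clear = frozenset((j, False) for j in range(1, n + 1))
    events.append((all_set, all_clear))
    return events

def to_state(value, n):
    """Counter value as the set of indices of bits that are 1."""
    return frozenset(i for i in range(1, n + 1) if (value >> (i - 1)) & 1)

def check_counter(n):
    """True iff the n + 1 events implement increment modulo 2**n."""
    events = counter_events(n)
    for value in range(2 ** n):
        s = to_state(value, n)
        applicable = [post for pre, post in events if holds(s, pre)]
        if len(applicable) != 1:
            return False
        if apply_post(s, applicable[0]) != to_state((value + 1) % 2 ** n, n):
            return False
    return True
```

For example, `check_counter(4)` walks through all 16 states of a 4-bit counter using only the 5 events returned by `counter_events(4)`.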

To allow learning of deterministic action models where preconditions are not
required to be maximal we need a different learning condition. Consider learning
an action on $P = \{p\}$ that sets $p$ true unconditionally. With non-maximal
preconditions, all of the following events would be consistent with any stream
for the action: $(\top, p)$, $(\neg p, p)$, $(p, \top)$. To get to a deterministic action model, the
learning function would have to delete either the first or the two latter events.
We can make it work as described in the following.

For any action model $A$ we define

$min(A) = A \restriction \{e \mid$ there is no event $e' \neq e$ with $pre(e) \models pre(e')\}$.

Furthermore, we define $L_3$ to be exactly like $L_2$ of Definition 13 except in the
definition of $E$, $\varphi$ can be any conjunction of literals, not only maximally consistent
ones.

Theorem 6. The class of deterministic action models is finitely identifiable by
the following update learning function $L_3^{update}$.

$$L_3^{update}(\mathcal{E}[n]) = \begin{cases} min(L_3(\mathcal{E}[n])) & \text{if for all } s \in \mathcal{P}(P) \text{ there exists an } s' \text{ s.t. } (s, s') \in set(\mathcal{E}[n]), \\ & \text{and for all } k < n,\ L_3^{update}(\mathcal{E}[k]) = \uparrow; \\ \uparrow & \text{otherwise.} \end{cases}$$

The proof of this theorem is left out. The theorem can be seen as a generalisation
of Theorem 5 in that it allows the learner to learn more compact action models
in which maximal consistency of preconditions is not enforced (on the contrary,
by the way the $min$ operator is defined above, the learner will learn an action
model with minimal preconditions). For instance, in the case of the $n$-bit counter
considered in Example 6 it can be shown that the learner will learn the action
model with $n + 1$ events instead of the one with $2^n$ events.
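
As an illustration of how the $min$ operator interacts with update learning, here is a Python sketch of a learner in the spirit of Theorem 6, run on the 'set $p$ true' action discussed above. It assumes the same state and event encoding as in the earlier examples, and it tests entailment between conjunctions of literals by set inclusion (a consistent conjunction entails another one exactly when it contains all of its literals). Function names are illustrative, not the paper's.

```python
from itertools import product

def holds(state, conj):
    return all((a in state) == v for a, v in conj)

def apply_post(state, post):
    """Assumed product update: make the postcondition literals true."""
    true_atoms = {a for a, v in post if v}
    false_atoms = {a for a, v in post if not v}
    return frozenset((state - false_atoms) | true_atoms)

def conjunctions(atoms):
    """All consistent conjunctions of literals over `atoms`, as literal sets."""
    atoms = sorted(atoms)
    for vals in product((True, False, None), repeat=len(atoms)):
        yield frozenset((a, v) for a, v in zip(atoms, vals) if v is not None)

def init_any_pre(atoms):
    """Events (pre, post) where post contains none of the conjuncts of pre."""
    return {(pre, post) for pre in conjunctions(atoms)
            for post in conjunctions(atoms) if not (pre & post)}

def update(events, observations):
    return {(pre, post) for pre, post in events
            if all(not holds(s, pre) or apply_post(s, post) == s2
                   for s, s2 in observations)}

def minimise(events):
    """Keep events whose precondition entails no other event's precondition."""
    return {e for e in events
            if not any(other != e and other[0] <= e[0] for other in events)}

def learn_min(atoms, stream):
    """Answer (n, model) at the first n where every state has been observed."""
    ordered = sorted(atoms)
    all_states = {frozenset(a for a, v in zip(ordered, vals) if v)
                  for vals in product((True, False), repeat=len(ordered))}
    init = init_any_pre(atoms)
    for n in range(len(stream) + 1):
        prefix = stream[:n]
        if all_states <= {s for s, _ in prefix}:
            return n, minimise(update(init, prefix))
    return None  # some state has not been observed yet

off, on = frozenset(), frozenset({"p"})
result = learn_min({"p"}, [(off, on), (on, on)])
```

On this stream the update step leaves the three events $(\top, p)$, $(\neg p, p)$ and $(p, \top)$, and `minimise` keeps only $(\top, p)$, whose precondition is the weakest.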

4 Action library learning 

In this section we introduce action library learning. This is the type of learning 
most relevant in planning. A finite set of action names is available to the agent. 
In order to plan a sequence of actions towards a goal it is essential to know 
what the corresponding actions do. As most of the results in this section are 
straightforward generalizations of our previous results, for the sake of space we 
will omit all proofs. 

An action library corresponds to what is called a planning domain (and 
sometimes also an action library) in classical planning: a specification of the 
available actions and their action schemas. Action library learning is the learning 
problem where the agent is initially only given a set A of action names and has 
to learn the action library $l : A \to Actions(P)$. That is, the agent initially only
knows the names of the available actions, and it then learns the action models 
that correspond to those names. 

Definition 14 (Action library). Let P denote a set of atomic propositions, 
and let A denote a finite set, the set of action names. An action library over 
$P, A$ is a mapping $l : A \to Actions(P)$ (a mapping from action names into action
models). If all actions in the codomain of l enjoy property X, then l is called an 
X action library (e.g., a deterministic action library l is one where all action 
models in the codomain of l are deterministic). 

Streams and learning functions for action libraries are defined similarly to
the case of individual actions. Let $P$ be a set of atomic propositions, and $A$ a
set of action names. A stream over $P, A$ is an infinite sequence of triples $(s, a, s')$
where $s, s'$ are propositional states over $P$ and $a \in A$. Notations $\mathcal{E}_n$, $\mathcal{E}[n]$, $set(\mathcal{E})$
and $set(\mathcal{E}[n])$ are defined similarly to Definition 5. Given $a \in A$, the $a$-substream
of $\mathcal{E}$ is given by $\mathcal{E}_a = \{(s, s') \mid (s, a, s') \in set(\mathcal{E})\}$. Let $l$ be an action library
over $P, A$. A stream for $l$ is an infinite sequence of triples $(s, a, s')$ where $s, s'$ are
propositional states over $P$, $a \in A$ and $s' \in s \otimes l(a)$. A library learning function
over $P, A$ is a mapping $\mathbb{L} : (\mathcal{P}(P) \times A \times \mathcal{P}(P))^* \to ((A \to Actions(P)) \cup \{\uparrow\})$.
Given a learning function $L$ for individual actions over $P$ (Definition 6), we
define the induced library learning function $\mathbb{L}$ over $P, A$ by

1. $\mathbb{L}(\mathcal{E}) := \uparrow$ if $L(\mathcal{E}_a) = \uparrow$ for some $a \in A$.

2. Otherwise, for all $a \in A$, $\mathbb{L}(\mathcal{E})(a) := L(\mathcal{E}_a)$.
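
The two clauses above translate almost directly into code. The Python sketch below lifts any learner for individual actions to a library learner; it uses `None` for the undefined value $\uparrow$ and, as a simplifying assumption, keeps each substream as an ordered list rather than a set. The `toy_learner` is only a stand-in to exercise the construction and is not one of the learning functions of this paper.

```python
UNDEFINED = None  # plays the role of the "no answer yet" value

def substream(stream, name):
    """Pairs (s, s') of all triples (s, a, s') in `stream` with a == name."""
    return [(s, s2) for s, a, s2 in stream if a == name]

def induced_library_learner(learner, action_names):
    """Lift a learner for individual actions to a learner for action libraries."""
    def library_learner(stream):
        answers = {a: learner(substream(stream, a)) for a in action_names}
        # Clause 1: undefined as soon as one action is still undecided.
        if any(answer is UNDEFINED for answer in answers.values()):
            return UNDEFINED
        # Clause 2: otherwise map every action name to its learned model.
        return answers
    return library_learner

def toy_learner(pairs):
    """Stand-in learner: answers once two distinct observations were seen."""
    seen = frozenset(pairs)
    return seen if len(seen) >= 2 else UNDEFINED

off, on = frozenset(), frozenset({"p"})
lib_learner = induced_library_learner(toy_learner, ["push", "hold"])
partial = lib_learner([(off, "push", on), (on, "push", off), (off, "hold", off)])
full = lib_learner([(off, "push", on), (on, "push", off),
                    (off, "hold", off), (on, "hold", on)])
```

Here `partial` is undefined because the substream for `hold` is still too short for the stand-in learner, while `full` maps both action names to an answer.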

From Theorem 6 we then immediately get the following.

Theorem 7. The class of deterministic action libraries is finitely identifiable by
the library learning function $\mathbb{L}_3^{update}$ (induced from $L_3^{update}$ of Theorem 6).

Example 7. Consider the electrical circuit below consisting of two switches (1 
and 2), a voltage source (left) and a light bulb (right). 


[Figure: electrical circuit with switches 1 and 2.]


When both switches are closed, the light will be on, otherwise it will be off.
Let proposition $s_i$ denote that switch $i$ is closed and let $l$ denote that the light
is on. Assume the available actions are $flip_1$ and $flip_2$ that flip switch 1 and
2, respectively. Consider an agent trying to learn how the switches and the
circuit work. This agent then tries to learn an action library over $\{s_1, s_2, l\}$,
$\{flip_1, flip_2\}$. Given a stream $\mathcal{E}$ for the action library, it can be shown that the
learning function $\mathbb{L}_3^{update}$ will eventually return the following action library that
describes it (note that it can be described in many equivalent ways).

$l(flip_1) = (\neg s_1 \wedge \neg s_2, s_1 \wedge \neg l), (\neg s_1 \wedge s_2, s_1 \wedge l), (s_1 \wedge \neg s_2, \neg s_1 \wedge \neg l), (s_1 \wedge s_2, \neg s_1 \wedge \neg l)$
$l(flip_2) = (\neg s_1 \wedge \neg s_2, s_2 \wedge \neg l), (\neg s_1 \wedge s_2, \neg s_2 \wedge \neg l), (s_1 \wedge \neg s_2, s_2 \wedge l), (s_1 \wedge s_2, \neg s_2 \wedge \neg l)$
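
As a sanity check, the action library above can be compared against the intended behaviour of the circuit. The Python sketch below encodes the eight events and verifies that, from every state in which the light is on exactly when both switches are closed, each flip action produces the expected successor state. The encoding of states and events and the product update are assumptions carried over from the earlier sketches; names such as `LIBRARY` and `circuit_state` are illustrative.

```python
from itertools import product

def lits(**values):
    """Conjunction of literals, e.g. lits(s1=True, l=False) for s1 and not l."""
    return frozenset(values.items())

def holds(state, conj):
    return all((a in state) == v for a, v in conj)

def apply_post(state, post):
    """Assumed product update: make the postcondition literals true."""
    true_atoms = {a for a, v in post if v}
    false_atoms = {a for a, v in post if not v}
    return frozenset((state - false_atoms) | true_atoms)

LIBRARY = {
    "flip1": [
        (lits(s1=False, s2=False), lits(s1=True, l=False)),
        (lits(s1=False, s2=True), lits(s1=True, l=True)),
        (lits(s1=True, s2=False), lits(s1=False, l=False)),
        (lits(s1=True, s2=True), lits(s1=False, l=False)),
    ],
    "flip2": [
        (lits(s1=False, s2=False), lits(s2=True, l=False)),
        (lits(s1=False, s2=True), lits(s2=False, l=False)),
        (lits(s1=True, s2=False), lits(s2=True, l=True)),
        (lits(s1=True, s2=True), lits(s2=False, l=False)),
    ],
}

def circuit_state(s1, s2):
    """Circuit state: the light is on exactly when both switches are closed."""
    values = {"s1": s1, "s2": s2, "l": s1 and s2}
    return frozenset(a for a, v in values.items() if v)

def expected(action, s1, s2):
    """Intended successor state after flipping the given switch."""
    if action == "flip1":
        return circuit_state(not s1, s2)
    return circuit_state(s1, not s2)

def library_matches_circuit(library):
    """True iff each action is deterministic and behaves like the circuit."""
    for action, events in library.items():
        for s1, s2 in product((True, False), repeat=2):
            s = circuit_state(s1, s2)
            results = [apply_post(s, post) for pre, post in events if holds(s, pre)]
            if results != [expected(action, s1, s2)]:
                return False
    return True
```

Only the four physically consistent circuit states are checked here; whether streams may also contain other states is left open by this sketch.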

5 Conclusions and related work 

This paper is the first to study the problem of learnability of action models in 
dynamic epistemic logic (DEL). We provided an original learnability framework 
and several early results concerning fully observable propositional action models 
with respect to conclusive (finite identifiability) and inconclusive (identifiability 
in the limit) learnability. Apart from those general results, we proposed various
learning functions which encode particular learning algorithms. Here, by
implementing the update method (commonly used in DEL), we demonstrated how
the learning of action models can be seen as transitioning from nondeterministic
to deterministic actions.

Related work. A similar qualitative approach to learning actions has been
addressed by [18] within the STRIPS planning formalism. The STRIPS setting is
more general than ours in that it uses atoms of first-order predicate logic for
pre- and postconditions. It is however less general in neglecting various aspects
of actions which we have successfully treated in this paper: negative
preconditions (i.e., negative literals as precondition conjuncts), negative postconditions,
and conditional effects (which we achieve through non-atomic action models). We
believe that the ideas introduced here can be applied to generalize the results of
[18] to richer planning frameworks allowing such action types. It is also worth
mentioning here that there has been a quite substantial amount of work relating
DEL and learning theory (see [11,12] for overviews), which concerns a different
setting: treating update and upgrade revision policies as long term learning
methods, where learning can be seen as convergence to certain types of knowledge
(see m). A study of abstract properties of finite identifiability in a setting
similar to ours, including various efficiency considerations, can be found in m-


Further directions. In this short paper we only considered fully observable
actions applied in fully observable states, and hence did not use the full
expressive power of the DEL formalism. The latter still remains adequate, since
action models provide a very well-structured and principled way of describing
actions in a logical setting, and since its use opens ways to various extensions.
The next steps are to cover more DEL action models: those with arbitrary pre-
and postconditions, and those with partial observability and multiple agents.
As described earlier, partially observable actions are not learnable in the strict
sense considered above, but we can still investigate agents learning "as much as
possible" given their limitations in observability. The multi-agent case is
particularly interesting due to the possibility of agents with varied limitations on
observability, and the possibility of communication within the learning process.

We plan to study the computational complexity of the learning proposed in this
paper, but also to investigate other more space-efficient learning algorithms. We
are also interested in algorithms that produce minimal action models. For instance,
if we allow action models that have event postconditions specified as mappings
from propositions to formulas (as is standard in DEL), then the action library for
the circuit of Example 7 can be described using only 2 events. However, learning
such minimal action descriptions might turn out to be computationally much
harder. Furthermore, we here considered only what we call reactive learning: the
learner has no influence over observations. We would also like to study the case
of proactive learning, where the learner gets to choose which actions to execute,
and hence observe their effects. This is probably the most relevant type of
learning for a general learning-and-planning agent. In this context, we also plan to
focus on consecutive streams: streams corresponding to executing sequences of
actions rather than observing arbitrary state transitions. Our ultimate aim is
to relate learning and planning within the framework of DEL. Those two
cognitive capabilities are now investigated mostly in separation; our goal is to bridge
them.